(57) ABSTRACT

A technique that partially encrypts tokenized documents is 
disclosed. An electronic document image is generated from 
the document. A plurality of tokens are stored as a 
dictionary, wherein the tokens represent shapes contained in 
the document. A plurality of triples comprising a token 
identification (ID) and a corresponding position are gener- 
ated from the document image, such that the token ID 
identifies a token from the dictionary that corresponds to a 
shape in the document image at the corresponding position. 
The token IDs are encrypted. The output representation for 
the secured document comprises encrypted token IDs, posi- 
tions and a dictionary of tokens. Encoding techniques that 
reduce the size of the secured document are also disclosed. 
A trusted image output terminal, for use in document 
reconstruction, includes a single integrated circuit that per- 
forms the decrypting function and the page rendering func- 
tion to significantly reduce the ability to capture the elec- 
tronic document in the clear. 
METHODS AND APPARATUS FOR PARTIAL
ENCRYPTION OF TOKENIZED 
DOCUMENTS 

BACKGROUND OF THE INVENTION

1. Field of the Invention 

The present invention is directed toward the field of 
controlling access to electronic documents, and more par- 
ticularly toward the field of securing information through 
partial encryption of tokenized documents.

2. Art Background 

Information contained in documents is a valuable com- 
modity in society. The ability to control distribution of 
documents is partially dependent upon the ownership of
intellectual property rights in the documents. One way of 
generating revenue from the ownership of documents is 
through the sale or license of the documents. In an outright 
sale, a user may purchase from an information provider all 
rights associated with information contained in the document.
As an owner of the rights in the document, the user is
permitted to copy, distribute, or re-use the information in any 
form. However, the outright sale of rights in the document 
may demand a high price. Furthermore, the information 
provider may not desire to sell all rights in the document.
Thus, a purchaser of documents may only want to buy 
limited rights in documents (i.e., obtain a license for limited 
rights in the document). One licensing scheme may set a 
license fee for the document dependent upon the rights 
conferred to the licensee (i.e., the purchaser of limited rights
in the document). For example, a user may purchase from an 
information provider the right to use a single copy of the 
document. Having only the right to possess a single copy of
the document, the user does not have a right to copy the
document or re-use portions of the document. Accordingly,
it is desirable for an information provider to control the use 
of a document by a user based on the rights conferred to the 
user. 

Currently, information and documents are most easily 
distributed in an electronic form. Electronic documents
provide fast, economic and efficient distribution of informa- 
tion not known through conventional techniques of distrib- 
uting paper or hard copies. However, the distribution of 
electronic copies of documents significantly increases the 
issues of access control. Once the user has an electronic
document in the clear, the user can easily copy the
document, distribute the document, and re-use all or portions
of the document to create an additional document. To
prevent unauthorized use of an electronic document, information
providers may encrypt the electronic version of the
document. Encryption, although effective, may require sub- 
stantial processing resources, and thus encumber the effec- 
tive use of the electronic document. Accordingly, it is 
desirable to develop techniques that provide an adequate 
level of security while minimizing the processing required to
recover the document. 

SUMMARY OF THE INVENTION 

A technique that partially encrypts tokenized documents 
generates secure documents while minimizing the processing
required to reconstruct the document. An electronic
document image is generated from the document. A plurality 
of tokens are stored as a dictionary, wherein the tokens 
represent shapes contained in the document. A plurality of 
triples comprising a token identification (ID) and a corresponding
position are generated from the document image,
such that the token ID identifies a token from the dictionary 
that corresponds to a shape in the document image at the
corresponding position. To secure the contents of the 
document, the token IDs are encrypted. The secured docu- 
ment comprises encoded positions (x, y), encrypted token 
IDs and a dictionary of tokens. 
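
The split described above, with positions left in the clear and only the token ID stream encrypted, can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the two-byte token ID width, the function names `secure` and `recover`, and in particular the cipher, which is a toy SHA-256 counter keystream standing in for whatever private key or public key scheme an implementation would actually use. It is not a recommended cipher.

```python
import hashlib
from typing import List, Tuple

Triple = Tuple[int, int, int]  # (x, y, token_id)


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 of key plus a counter (illustration only)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(out[:length])


def xor_with_keystream(data: bytes, key: bytes) -> bytes:
    """XOR data with the keystream; applying it twice restores the input."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def secure(triples: List[Triple], key: bytes) -> Tuple[List[Tuple[int, int]], bytes]:
    """Split triples into clear positions and an encrypted token ID stream."""
    positions = [(x, y) for x, y, _ in triples]
    # Assumption: token IDs fit in two bytes each.
    id_stream = b"".join(t.to_bytes(2, "big") for _, _, t in triples)
    return positions, xor_with_keystream(id_stream, key)


def recover(positions: List[Tuple[int, int]], encrypted_ids: bytes, key: bytes) -> List[Triple]:
    """Decrypt the token ID stream and recombine it with the positions."""
    id_stream = xor_with_keystream(encrypted_ids, key)
    token_ids = [int.from_bytes(id_stream[i:i + 2], "big")
                 for i in range(0, len(id_stream), 2)]
    return [(x, y, t) for (x, y), t in zip(positions, token_ids)]
```

Only the token ID bytes pass through the cipher, so the work done by the encrypting and decrypting steps scales with the size of that stream rather than with the size of the whole document.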

In one embodiment, to reduce the size of the secured 
document, the positions in the sequence of triples are 
encoded. For this embodiment, the positions are differen- 
tially encoded, so as to only store information that identifies 
a position relative to a preceding position. In another
embodiment, each triple in the sequence is augmented with 
a color designation to include color in the secured document. 
If desired, contone color is included in the output representation
by augmenting each triple in the sequence with JPEG
compressed images. 

Document reconstruction in the partial encryption of 
tokenized document technique includes receiving the dic- 
tionary of tokens, encrypted token identification (IDs) and 
their corresponding positions in the document. The 
encrypted token IDs are decrypted to generate token IDs. 
The original document image is then reconstructed from the 
token IDs and positions by generating a shape corresponding 
to a shape in the dictionary identified by a token ID at a 
location identified by a corresponding position. In one
embodiment, a trusted image output terminal includes a 
single integrated circuit that performs the decrypting func- 
tion and the page rendering function to significantly reduce 
the ability to capture signals of the electronic document in 
the clear. 
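
As a rough illustration of the reconstruction step described above, the following Python sketch places dictionary shapes on a blank page at the positions given by the triples. The bitmap format (rows of '#' and '.'), the name `render_page`, and the convention that (x, y) marks the top-left corner of a shape are all assumptions made for this example; the source does not specify them. The token IDs are taken to be already decrypted.

```python
from typing import Dict, List, Tuple

Bitmap = List[str]  # rows of '#' (ink) and '.' (blank); assumed format


def render_page(width: int, height: int,
                dictionary: Dict[int, Bitmap],
                triples: List[Tuple[int, int, int]]) -> List[str]:
    """Draw each token's shape at its (x, y) position on a blank page."""
    page = [["."] * width for _ in range(height)]
    for x, y, token_id in triples:
        shape = dictionary[token_id]  # look the shape up by its token ID
        for dy, row in enumerate(shape):
            for dx, pixel in enumerate(row):
                inside = 0 <= y + dy < height and 0 <= x + dx < width
                if pixel == "#" and inside:
                    page[y + dy][x + dx] = "#"
    return ["".join(row) for row in page]


# Usage: a two-shape dictionary and two triples.
shapes = {0: ["###", ".#.", ".#."], 1: ["#.#", "###", "#.#"]}
page = render_page(7, 3, shapes, [(0, 0, 0), (4, 0, 1)])
```

Without the token IDs, the same dictionary and positions say only that some shape sits at each location, which is the property the partial encryption relies on.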

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is a block diagram illustrating one embodiment for 
partial encryption of tokenized documents. 

FIG. 2 is a flow diagram illustrating one embodiment for 
document generation using the partial encryption of token- 
ized document techniques of the present invention. 

FIG. 3 illustrates a network for transmission of a secured 
document for use with the present invention. 

FIG. 4 is a block diagram illustrating one embodiment for 
a document reconstruction system for use with the partial 
encryption of tokenized documents technique. 

FIG. 5 is a flow diagram illustrating one embodiment for 
document reconstruction with partially encrypted tokenized 
documents. 

FIG. 6 is a block diagram illustrating one embodiment for 
a trusted image output terminal for use with the partial 
encrypting of tokenized documents technique. 

FIG. 7 illustrates a high level block diagram of a general 
purpose computer system in which the partial encryption of 
tokenized documents technique of the present invention may 
be implemented. 

DETAILED DESCRIPTION 

FIG. 1 is a block diagram illustrating one embodiment for 
partial encryption of tokenized documents. A document 
generation system 100 processes document images 110 to 
generate secured documents 180. Document images 110 
represent a broad category of information residing as elec- 
tronic images. The document images 110 may include a 
compilation of information from any source. For example, 
the document images 110 may be information stored on a 
computer system as electronic images. Also, the document 
images 110 may be accessed via a network, and stored at one 
or more remote locations. The content of document images 
110 may include articles, books, periodicals, etc. The
document images 110 may be stored in any well known format
for storing electronic documents.

For the embodiment shown in FIG. 1, the document images 110
are input to a compressor 130. In general, the compressor 130
receives, as input, the document images 110, and generates a
compressed version of the document images 110 (i.e., reduces
the number of bits required to represent the document image).
For this embodiment, the document generation system 100
utilizes a token based document image compression scheme.

In general, token based document image compression involves
processing the document images 110, which represent pages of
a document, by searching the document images for distinctive
shapes. For example, the letter "T", appearing in a particular
font and size, has a distinctive shape. The shape of the
letter "T" occurs multiple times throughout the document. For
this example, the letter "T" is stored as a shape or token in
a collection of dictionaries. For each shape or token
encountered in a document, the corresponding shape or token is
stored in a dictionary. After compression, the dictionaries
contain the individual shapes or tokens that make up the
document images. Note that the dictionaries are compilations
of shapes only, and they do not reveal how the shapes are
arranged in a corresponding document. In the document
generation system 100, the compilation of shapes and tokens
for document images 110 is represented as dictionaries 120.

The compressor 130 generates a position block for each page of
the document image. In general, the position block identifies,
for each page, locations and the corresponding shape or token
that appears in the corresponding position. In the document
generation system 100, the position block is labeled position
block 140, and consists of a stream of triples, (x, y, token
ID).

In one embodiment, the document generation system 100 employs
a low level encoding scheme to further reduce the size of the
document image. The bit level encoding scheme is shown in
FIG. 1 as encoding 135. In one embodiment, encoding 135
employs a Huffman encoding or compression scheme. A bit level
encoding scheme, such as Huffman compression, is well known in
the art and will not be described further.

As shown in FIG. 1, the token ID portion of the position block
(encoded) is input to encryptor 150, and the position portion
(i.e., x, y) is input to differential encoder 160. The
encryptor 150 encrypts the stream of token IDs for each
position block. Any encryption technique may be used to
encrypt the stream of token IDs. For example, encryptor 150
may employ a private key cryptography scheme or a public key
cryptography scheme. Various techniques for both public key
cryptography and private key cryptography are well known in
the art and will not be described further.

The position portion (i.e., x, y) of position block 140 is
encoded in differential encoder 160 to reduce the size of the
position block. Also, bit level encoding is performed in
encoding 137 on the differentially encoded positions (x, y).
For transmission, the secured document consists of the encoded
positions (x, y) and encrypted token IDs. The compilation of
shapes for dictionaries 120 is also transmitted with the
document, and thus is part of secured document 180 as well.

FIG. 2 is a flow diagram illustrating one embodiment for
document generation using the partial encryption of tokenized
document techniques of the present invention. As shown in
block 200, the document images are parsed into tokens or
shapes per the compilation of shapes in the dictionaries 120.
For each token, token_n, a triple (x, y, token ID)_n is
generated as shown in block 210. The (x, y) identifies the
position, and the token ID identifies the corresponding shape
at that position. As shown in block 220, the stream of triples
(x, y, token ID)_n is separated into two streams: positions
(x, y)_n and token IDs_n. The stream of token IDs is encrypted
(block 230).

The differential encoding of the position block improves
compression by utilizing the fact that the token positions are
not randomly distributed. Instead, in one embodiment, the
token positions are organized into lines of text. For this
embodiment, the differential encoder 160 (FIG. 1) encodes the
token positions in generally the order in which the document
would be read (i.e., line by line from left to right). To
implement this embodiment, the triples are sorted into an
approximation of text lines organized from top to bottom. For
each text line, the triples are sorted left to right. As
discussed below, the encoding reflects the gap between the
previous token's right-hand edge and the current token's
left-hand edge.

For the embodiment of FIG. 2, differential encoding is used to
further reduce the size of positions in the position block.
For each x position of a triple, the differential encoder 160
encodes the difference from the x position of the previous
triple minus the width in pixels of the previous triple's
token. For example, if the horizontal or x position for a
position block is 100, the token corresponding to the position
x_n is 8 pixels in length, and there are two additional spaces
between that position and the preceding image, then the
differentially encoded x_n is stored as "2". Specifically, for
each position, position x_n, and a subsequent position in a
single row, a difference between the prior position and the
subsequent position is calculated to generate the
differentially encoded position, position x_n. As shown in
blocks 250 and 255, position x_n is calculated by subtracting
"x" from the current position x_n, and updating x to be the
current position. To differentially encode the vertical
position y, entitled position y_n, the position for the
current row is subtracted from the current character, thereby
storing only the difference. This expression is shown in block
260 of FIG. 2. For example, if the vertical position for
position y_n is 50, and the current base y coordinate is 50,
then the position y_n is 0 (i.e., there is no difference
between the vertical positions of this token ID and the
current row). The algorithm for differentially encoding the
position portion of the position block is shown in blocks 240,
250, 255, 260, 265, 270, 280 and 285 of FIG. 2.

As shown in block 290 of FIG. 2, the encrypted token IDs and
differentially encoded positions are assembled to create, in
part, the secured document. The encrypted token IDs and
positions, along with the compilation of dictionaries, are now
available for use as a secured document. For example, the
secured document may be transmitted to a designated recipient
(block 295).

In one embodiment, the partial encryption of tokenized
documents technique includes documents with color. For this
embodiment, color information is included in the secured
document 180 by labeling each position block (x, y) with the
corresponding color. In the document reconstruction system
(FIG. 5), the color information for a position is extracted to
generate color at the corresponding position on a document
image. Contone color information is included by adding to a
document page JPEG-compressed images. The contone color is
drawn as background to the document image page. Alternatively,
the contone color information may be clipped by a mask during
document reconstruction.
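
The differential encoding of positions described for FIG. 2 can be sketched in Python as follows. This is a minimal illustration of the gap-based form given in the prose (current left edge minus the previous token's right edge, and y relative to the row's base coordinate), not the exact flow of blocks 240 through 285. The grouping of triples into text lines is assumed to have been done already, the starting reference edge of 0 is an assumption, and the names `encode_line` and `decode_line` are hypothetical.

```python
from typing import List, Tuple

Token = Tuple[int, int, int]  # (x, y, width_in_pixels) for one token in a line


def encode_line(tokens: List[Token], base_y: int) -> List[Tuple[int, int]]:
    """Encode one text line as (gap from previous right edge, y offset)."""
    encoded = []
    prev_right = 0  # assumed reference edge at the start of each line
    for x, y, width in sorted(tokens):  # left-to-right order
        encoded.append((x - prev_right, y - base_y))
        prev_right = x + width
    return encoded


def decode_line(deltas: List[Tuple[int, int]], widths: List[int],
                base_y: int) -> List[Tuple[int, int]]:
    """Invert encode_line; widths come from the decoded tokens' shapes."""
    positions = []
    prev_right = 0
    for (dx, dy), width in zip(deltas, widths):
        x = prev_right + dx
        positions.append((x, base_y + dy))
        prev_right = x + width
    return positions


# Usage: a token at x=90 that is 8 pixels wide, followed by one at x=100,
# yields a stored horizontal value of 2 for the second token.
line = [(90, 50, 8), (100, 50, 6), (108, 53, 5)]
deltas = encode_line(line, base_y=50)
```

Decoding this gap-based form needs each token's width, which this sketch assumes is looked up from the dictionary. A plain difference between successive x positions, as the description of blocks 250 and 255 suggests, would not need the widths but would store larger values.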
The partial encryption of token based documents technique of
the present invention is an efficient technique that provides
relatively high security. The technique, which only encrypts
the sequence of token IDs and leaves the encoded positions,
token dictionaries and color information in the clear,
substantially reduces the amount of processing needed to
recover a secured document compared with a technique that
encrypts the entire position block. Typically, the token ID
stream accounts for approximately 10 to 30% of the bits in the
document. Accordingly, there is, on average, a five-fold
reduction in the number of bits that must be encrypted to
generate a document and that must be decrypted to recover a
document. This efficiency is on top of the reduction in the
document image file size obtained by using a token-based
document representation. Typically, a token-based document
representation reduces the file size by a factor of three to
ten. Accordingly, overall, the partially encrypted token based
document compression has approximately thirty times fewer
encrypted bits than the number of bits that would be required
to fully encrypt the original document images.

Security of Partial Encryption of Tokenized Documents:

The partial encryption of tokenized documents technique of the
present invention provides a fairly high level of security
that protects against unauthorized copying of a secured
document. However, this technique does provide the untrusted
user access to some of the information contained in the
secured document. The dictionaries 120 (FIG. 1), which are
sent in the clear, contain only shapes. No position
information that identifies where the shape occurs in the
document is provided with the dictionaries. Thus, the ability
to read the dictionaries only allows the untrusted user to see
what shapes are used in the documents. This is equivalent to
knowing what fonts were used to construct the documents.

Some documents may include a shape that appears only once in
the document. For example, a line drawing unique to a
particular document may occur only a single time in the
document. Thus, these shapes are visible to the untrusted user
in the dictionaries. In one embodiment, to protect the content
information of the character, the character is encrypted. For
this embodiment, additional processing is required to decrypt
and to uncover the secured document. In another embodiment,
the unique shape is partitioned into arbitrary sub-shapes. For
example, if the unique shape is a line, then the line is
divided into smaller line segments. For this embodiment, the
original shape is reconstructed in the document reconstruction
system (FIG. 5). The sub-shapes stored in the dictionaries
contain little information. This technique of dividing
original shapes into sub-shapes increases the compression
ratio in the compressor 130 (FIG. 1) because multiple triples
must be used to instruct the document reconstruction system
(FIG. 5) on how and where to reconstruct the sub-shapes.
However, this technique does not increase the amount of
processing required to encrypt the token IDs.

The position information (i.e., x, y) in the position block
140 is not enough information to reconstruct the content of
the document. The information encoded in the differential
encoder 160 (FIG. 1) is a sequence of inter-character spaces
from which the untrusted user could gain little information.
However, it is possible to extract, to some degree of
accuracy, the word length and to identify characters that have
descenders. The ability to extract this information may be
reduced by using a more sophisticated position block encoder
(i.e., more sophisticated than the differential coding
technique). An untrusted user could, using natural language
models and having an expectation of the document contents,
reconstruct some of the text of the document. Accordingly,
this technique may not be the preferred technique for highly
sensitive and confidential documents. However, for most
applications involving documents whose content is not
extremely sensitive, the techniques of the present invention
provide adequate protection.

The color information, if present in the secured document, is
also not very useful to the untrusted user in determining the
content of the document. In general, the color information
only discloses that some sequence of tokens occurred in a
particular color. The color information is sent in the clear
(i.e., the color information in the position block is not
encrypted). Since the color information is available to the
trusted printer (FIG. 6) prior to decryption processing, the
untrusted component of the printer performs pre-processing
steps using the color information. For example, the untrusted
component of the printer may pre-process trapping or line
thickening using the color and shape information. This further
reduces the amount of processing required in the "trusted
component" portion of the printer, thereby reducing the
ability to compromise the secured document.

The contone information, if present, is also sent in the clear
(i.e., the contone information is not encrypted). If the
principal value of the document is a collection of images
represented in the contone information, then sending contone
information in the clear is not an appropriate technique.
Sending the contone information in the clear is an attempt to
prevent access to the textual portion of the document, and
thus assumes that the text is the valuable portion of the
document.

Secured Document Distribution:

FIG. 3 illustrates a network for transmission of a secured
document for use with the present invention. Documents 180 are
stored in a document repository 310. Document repository 310,
depicted as a large storage medium in FIG. 3, represents a
broad category of storage devices suitable for storing a
repository of documents. The documents are stored in the
document repository 310 as "secured documents" 180 (i.e., the
documents are processed to include partial encryption of
tokenized documents). For the example document distribution
system shown in FIG. 3, the document repository 310 is coupled
to a wide area network 320. Wide area network 320 represents a
broad category of transmission medium used to distribute
information globally. Local area network 330 is also shown on
FIG. 3. Local area network 330 couples a plurality of personal
computers, labeled 340. A trusted printer 400 is also coupled
to the local area network 330. For purposes of illustration,
the personal computers 340 transmit electronic documents to
trusted printer 400 to obtain paper printed images of
documents.

For the example network shown in FIG. 3, secured documents 180
are transmitted to one or more personal computers 340 via wide
area network 320 and local area network 330. The secured
documents 180 may be stored on a server for the local area
network 330 or any one or more of the personal computers 340.
At this point, the secured document cannot be viewed
electronically by a user of a personal computer 340. Thus,
even if a user makes multiple copies of a document in an
electronic form, the secured document is not useful to the
user.

To view the content of the secured document in the example of
FIG. 3, the user transmits the secured documents 180 to the
trusted printer 400. The document reconstruction system
(FIG. 5) is implemented in the trusted printer 400. As such,
the trusted printer 400 decompresses and decrypts the secured
document 180 and prints a document image on paper. The user
may then view the document as a hard copy.
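
Returning to the efficiency figures quoted earlier for the token ID stream (10 to 30% of the bits, and a three to ten times smaller file from tokenization), the short Python check below works through the arithmetic. The function name and the use of integer percentages are choices made for this example only.

```python
def encrypted_bit_reduction(compression_factor: int, token_id_percent: int) -> float:
    """Factor by which encrypted bits shrink versus fully encrypting the
    original images: file-size reduction divided by the token ID share."""
    return compression_factor * 100 / token_id_percent


# Partial encryption alone, at the midpoint of the 10 to 30% range.
partial_only = encrypted_bit_reduction(1, 20)

# Combined with tokenization: least and most favorable ends of both ranges.
low = encrypted_bit_reduction(3, 30)
high = encrypted_bit_reduction(10, 10)
```

Under these inputs the midpoint gives the five-fold figure, and the combined factor ranges from 10 to 100, so the "approximately thirty times" estimate sits inside that range as a typical value rather than a bound.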
Document Reconstruction:

FIG. 4 is a block diagram illustrating one embodiment for a
document reconstruction system for use with the partial
encryption of tokenized documents technique. A document
reconstruction system 500 receives, as an input, secured
documents 180, and generates, as an output, a document image
560. As described below, the document image 560 may be a hard
copy printed on paper (See FIGS. 3 and 6). As discussed above,
the secured documents 180 include the encrypted token IDs,
positions and dictionaries 120. The encrypted token IDs 170
are input to decryptor 510. The decryptor 510 decrypts the
token IDs in accordance with the encryption scheme used in the
encryptor 150 (FIG. 1). Also, as shown in FIG. 4, decryptor
510 receives a private/public key from an individual
authorized to view the document. A decoder, labeled 550 on
FIG. 4, decodes the positions (x, y) on a bit level. A
differential decoder 520 receives the positions from the
decoder 550, and differentially decodes the positions. A bit
level decoder 555 decodes the token IDs in accordance with the
encoding 135 (FIG. 1) scheme. As shown in FIG. 4, the position
block is re-combined into a stream of triples 530 with the
decoded position (x, y) and the decrypted token ID. The stream
of triples 530 is then input to decompressor 540. Decompressor
540 receives dictionaries 120 to generate shapes designated by
token IDs at the corresponding positions (x, y). The output of
document reconstruction system 500, in a human readable form,
is shown as document image 560.

FIG. 5 is a flow diagram illustrating one embodiment for
document reconstruction with partially encrypted tokenized
documents. The secured document is divided into positions
(x, y) and token IDs for each of "n" tokens in the document
(block 600). With user provided keys, the token IDs are
decrypted (block 610). The positions (x, y) are differentially
decoded as set forth in the algorithm in blocks 630, 635, 640,
645, 650, 660, 665, 670 and 675. Although differential
encoding/decoding is described herein, any technique that
reduces the ability to derive the content information from the
position information and dictionaries may be used without
deviating from the spirit or scope of the invention. The
positions (x, y) and corresponding token IDs are recombined
(block 680). Shapes from the dictionaries, identified by the
token IDs, are generated at the corresponding positions
(block 685). The document is printed on paper to provide a
hard copy of the document (block 690).

FIG. 6 is a block diagram illustrating one embodiment for a
trusted image output terminal for use with the partial
encrypting of tokenized documents technique. A trusted image
output terminal 400 includes, in part, a document
reconstruction system implemented on a single integrated
circuit 410. In one embodiment, the trusted image output
terminal 400 is a laser printer; however, any printing
technology may be used in conjunction with the techniques of
the present invention. As shown in FIG. 6, the single
integrated circuit 410 includes decryption 420 and page
rendering 430. In general, the decryption 420 receives the
encrypted token IDs and public/private keys, and generates
decrypted token IDs. The token IDs, dictionaries, and
positions (x, y) are input to page rendering 430. Page
rendering 430 decodes the positions (x, y), and decompresses
the positions (x, y) and token IDs to generate a page image.
The page image is the output of the integrated circuit 410. A
raster output subsystem 440 (laser) receives the page image,
and generates the page image on paper to produce a hard copy
of the document. In one embodiment, the single integrated
circuit 410 is implemented with an application specific
integrated circuit chip (ASIC).

As shown on FIG. 6, the processing to decrypt the token IDs
and to render the page images are successive. This allows
implementation of both the decryption and page rendering
functions in the single integrated circuit 410. Integrating
the decryption and page rendering functions in a single
integrated circuit 410 makes access to the decrypted token ID
stream extremely difficult. For example, an untrusted user
tampering with the trusted image output terminal in an attempt
to capture the decrypted document in electronic form would
have to access internal data paths in the single integrated
circuit 410. This task is extremely difficult. As shown by the
lines in FIG. 6, the untrusted user may capture the partially
encrypted data stream external to the single integrated
circuit chip 410. However, the user already has access to the
partially encrypted tokenized document. On the output of the
integrated circuit chip 410, an untrusted user could capture
the page image. However, capturing the page image is
equivalent to a noise-free scanning of the paper that the
printer produces. At the output stage of the single integrated
circuit, printing of the paper image is imminent. Based on the
trusted image output terminal 400 of FIG. 6 and a document
distribution network example of FIG. 3, an untrusted user
cannot gain access to the decrypted electronic version of the
document, and therefore cannot make unauthorized copies of the
electronic version of the document.

Computer System:

FIG. 7 illustrates a high level block diagram of a general
purpose computer system in which the partial encryption of
tokenized documents technique of the present invention may be
implemented. A computer system 1000 contains a processor unit
1005, main memory 1010, and an interconnect bus 1025. The
processor unit 1005 may contain a single microprocessor, or
may contain a plurality of microprocessors for configuring the
computer system 1000 as a multi-processor system. The main
memory 1010 stores, in part, instructions and data for
execution by the processor unit 1005. If the partial
encryption of tokenized documents technique of the present
invention is wholly or partially implemented in software, the
main memory 1010 stores the executable code when in operation.
The main memory 1010 may include banks of dynamic random
access memory (DRAM) as well as high speed cache memory.

The computer system 1000 further includes a mass storage
device 1020, peripheral device(s) 1030, portable storage
medium drive(s) 1040, input control device(s) 1070, a graphics
subsystem 1050, and an output display 1060. For purposes of
simplicity, all components in the computer system 1000 are
shown in FIG. 7 as being connected via the bus 1025. However,
the computer system 1000 may be connected through one or more
data transport means. For example, the processor unit 1005 and
the main memory 1010 may be connected via a local
microprocessor bus, and the mass storage device 1020,
peripheral device(s) 1030, portable storage medium drive(s)
1040, graphics subsystem 1050 may be connected via one or more
input/output (I/O) busses. The mass storage device 1020, which
may be implemented with a magnetic disk drive or an optical
disk drive, is a non-volatile storage device for storing data
and instructions for use by the processor unit 1005. In the
software embodiment, the mass storage device 1020 stores the
partial encryption of tokenized documents software for loading
to the main memory 1010.

The portable storage medium drive 1040 operates in conjunction
with a portable non-volatile storage medium, such as a floppy
disk or a compact disc read only memory (CD-ROM), to input and
output data and code to and from



US 6,449,718 B1



the computer system 1000. In one embodiment, the partial 
encryption of tokenized documents software is stored on 
such a portable medium, and is input to the computer system 
1000 via the portable storage medium drive 1040. The 
peripheral device(s) 1030 may include any type of computer 
support device, such as an input/output (I/O) interface, to
add functionality to the computer system 1000.
For example, the peripheral device(s) 1030 may include a 
network interface card for interfacing the computer system 
1000 to a network. For the software implementation, docu- 
ments may be input to the computer system 1000 via a 
portable storage medium or a network for processing by the 
partial encryption of tokenized documents software. 

The input control device(s) 1070 provide a portion of the 
user interface for a user of the computer system 1000. The 
input control device(s) 1070 may include an alphanumeric
keypad for inputting alphanumeric and other key
information, and a cursor control device, such as a mouse, a
trackball, a stylus, or cursor direction keys. In order to display
textual and graphical information, the computer system 
1000 contains the graphics subsystem 1050 and the output 
display 1060. The output display 1060 may include a 
cathode ray tube (CRT) display or liquid crystal display 
(LCD). The graphics subsystem 1050 receives textual and 
graphical information, and processes the information for 
output to the output display 1060. The components con- 
tained in the computer system 1000 are those typically found 
in general purpose computer systems, and in fact, these 
components are intended to represent a broad category of 
such computer components that are well known in the art. 

The partial encryption of tokenized document techniques,
including the document generation system and the document
reconstruction system, may be implemented in either hardware
or software. For the software implementation, the
document generation and document reconstruction systems 
are software that includes a plurality of computer executable 
instructions for implementation on a general purpose com- 
puter system. Prior to loading into a general purpose com- 
puter system, the document generation and document recon- 
struction system software may reside as encoded 
information on a computer readable medium, such as a
magnetic floppy disk, magnetic tape, or compact disc read
only memory (CD-ROM). In one hardware implementation,
the document generation and document reconstruction sys- 
tems may comprise circuits, implemented on a single inte- 
grated circuit as shown in FIG. 6. 
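
For the software implementation, the flow from triples to a secured representation, and back through decryption and page rendering, may be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the function names (`secure`, `render`, `crypt_token_ids`), the text-bitmap form of the tokens, and the 16-bit token ID width are hypothetical, and the hash-based keystream is only a stand-in for the public/private key decryption described for FIG. 6. It is not a vetted cipher.

```python
import hashlib
from typing import Dict, List, Tuple

Bitmap = List[str]             # hypothetical token shape: rows of '#' and '.'
Triple = Tuple[int, int, int]  # (token ID, x, y)


def _keystream(key: bytes, n: int) -> bytes:
    """Toy keystream from SHA-256 over a counter (assumption, illustration only)."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]


def crypt_token_ids(ids: List[int], key: bytes) -> List[int]:
    """XOR each 16-bit token ID with the keystream; the same call decrypts."""
    ks = _keystream(key, 2 * len(ids))
    return [tid ^ int.from_bytes(ks[2 * i:2 * i + 2], "big")
            for i, tid in enumerate(ids)]


def secure(triples: List[Triple], dictionary: Dict[int, Bitmap], key: bytes) -> dict:
    """Encrypt only the token IDs; positions and dictionary stay in the clear."""
    ids = [t[0] for t in triples]
    positions = [(t[1], t[2]) for t in triples]
    return {"encrypted_ids": crypt_token_ids(ids, key),
            "positions": positions,
            "dictionary": dictionary}


def render(secured: dict, key: bytes, width: int, height: int) -> List[str]:
    """Decrypt the token IDs and paint each token's shape at its position."""
    ids = crypt_token_ids(secured["encrypted_ids"], key)
    page = [["."] * width for _ in range(height)]
    for tid, (x, y) in zip(ids, secured["positions"]):
        for dy, row in enumerate(secured["dictionary"][tid]):
            for dx, pixel in enumerate(row):
                if pixel == "#":
                    page[y + dy][x + dx] = "#"
    return ["".join(row) for row in page]


# Usage: two tiny shapes placed three times on an 8 x 2 page.
dictionary = {0: ["##", "#."], 1: [".#", "##"]}
triples = [(0, 0, 0), (1, 3, 0), (0, 6, 0)]
secured = secure(triples, dictionary, b"demo key")
page = render(secured, b"demo key", 8, 2)
```

In this sketch, a holder of the secured representation without the key sees the shapes and where shapes fall on the page, but not which shape belongs at which position.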

Although the present invention has been described in 
terms of specific exemplary embodiments, it will be appre- 
ciated that various modifications and alterations might be 
made by those skilled in the art without departing from the 
spirit and scope of the invention. 

What is claimed is: 

1. A method for partially encrypting a tokenized 
document, said method comprising the steps of: 

generating, from a document, a document image; 

storing a plurality of tokens for use as at least one 
dictionary, wherein a token represents a shape; 

generating, from said document image, a plurality of 
triples comprising a token identification (ID) and a 
corresponding position in said document image, 
wherein said token ID identifies a token from said 
dictionary that corresponds to a shape in said document 
image at said corresponding position; and 

encrypting said token IDs, wherein an output representa- 
tion for said document comprises encrypted token IDs, 
positions, and a dictionary of tokens. 

2. The method as set forth in claim 1, further comprising 
the step of encoding said positions so as to reduce the 
information between two successive positions in said output 
representation. 

3. The method as set forth in claim 2, wherein the step of
encoding said positions comprises the step of differentially 
encoding information between two successive positions. 

4. The method as set forth in claim 1, further comprising 
the step of transmitting said output representation compris- 
ing said encrypted token IDs, said positions, and said 
dictionary of tokens. 

5. The method as set forth in claim 1, further comprising 
the step of assigning each position with a color designation 
to include color in said output representation of said
document.

6. The method as set forth in claim 1, further comprising 
the step of augmenting each position in said sequence with 
JPEG compressed images to include contone color in said 
output representation. 

7. A method for reconstructing a tokenized document, said 
method comprising the steps of: 

receiving at least one dictionary comprising a plurality of 
tokens, wherein a token represents a shape; 

receiving a representation for a document comprising a 
plurality of encrypted token identifications (IDs) and 
corresponding positions in said document image, such 
that a token ID identifies a token from said dictionary 
that corresponds to a shape in said document image at 
a corresponding position; 

decrypting said encrypted token IDs to generate token 
IDs; and 

generating a document image by generating a shape 
corresponding to a shape in said dictionary identified 
by a token ID at a location identified by a correspond- 
ing position. 

8. The method as set forth in claim 7, further comprising 
the steps of: 

receiving encoded positions; and 
decoding said encoded positions. 

9. The method as set forth in claim 8, wherein the step of 
decoding said encoded positions comprises the step of 
differentially decoding two successive positions to obtain 
absolute coordinates for said positions. 

10. The method as set forth in claim 7, further comprising 
the steps of: 

receiving a color designation for at least one position; and 
generating color at said position in accordance with said 
color designation. 

11. The method as set forth in claim 7, further comprising 
the steps of: 

receiving contone color for at least one position; and 
generating for said position contone color in said docu- 
ment image. 

12. An image output terminal for printing a document 
image comprising: 

decryption circuitry for receiving a plurality of encrypted 
token identifications (IDs) and for decrypting said 
encrypted token IDs to generate token IDs; 

page rendering circuitry for receiving positions corre- 
sponding to said token IDs and at least one dictionary 
comprising a plurality of tokens that represent shapes, 
said page rendering circuitry being coupled to said 
decryption circuitry for receiving said token IDs, and 
for generating a page image from said token IDs, 
dictionary and positions through identification of 
tokens from said dictionary that corresponds to a shape 
in said document image at said corresponding position; 
and 

a raster output subsystem coupled to said page rendering 
circuitry for receiving said page image and for gener- 
ating a paper printed image. 

13. The image output terminal as set forth in claim 12,
wherein said decryption circuitry and said page rendering 
circuitry comprise a single integrated circuit device. 

14. A computer readable medium comprising a set of 
instructions stored therein, which when executed by a
computer, causes the computer to perform the steps of:

generating, from a document, a document image; 

storing a plurality of tokens for use as at least one 
dictionary, wherein a token represents a shape; 

generating, from said document image, a plurality of 
triples comprising a token identification (ID) and a 
corresponding position in said document image, 
wherein said token ID identifies a token from said 
dictionary that corresponds to a shape in said document 
image at said corresponding position; and 

encrypting said token IDs, wherein an output representa- 
tion for said document comprises encrypted token IDs, 
positions, and a dictionary of tokens. 

15. The computer readable medium as set forth in claim
14, further comprising the step of assigning each position
with a color designation to include color in said output 
representation of said document. 

16. The computer readable medium as set forth in claim
15, wherein the step of encoding said positions comprises
the step of differentially encoding information between two 
successive positions. 

17. The computer readable medium as set forth in claim 
14, further comprising the step of transmitting said output 
representation comprising said encrypted token IDs, said 
positions, and said dictionary of tokens. 

18. The computer readable medium as set forth in claim 
14, further comprising the step of assigning each position 
with a color designation to include color in said output 
representation of said document. 

19. The computer readable medium as set forth in claim 
14, further comprising the step of augmenting each position 
in said sequence with JPEG compressed images to include 
contone color in said output representation. 

20. A computer readable medium comprising a set of
instructions stored therein, which when executed by a 
computer, causes the computer to perform the steps of: 

receiving at least one dictionary comprising a plurality of 
tokens, wherein a token represents a shape; 

receiving a representation for a document comprising a 
plurality of encrypted token identifications (IDs) and 
corresponding positions in said document image, such 
that a token ID identifies a token from said dictionary 
that corresponds to a shape in said document image at 
a corresponding position; 

decrypting said encrypted token IDs to generate token 
IDs; and 

generating a document image by generating a shape 
corresponding to a shape in said dictionary identified 
by a token ID at a location identified by a correspond- 
ing position. 

21. The computer readable medium as set forth in claim
20, further comprising the steps of:

receiving encoded positions; and 
decoding said encoded positions. 

22. The computer readable medium as set forth in claim
21, wherein the step of decoding said encoded positions
comprises the step of differentially decoding two successive 
positions to obtain absolute coordinates for said positions. 

23. The computer readable medium as set forth in claim 
20, further comprising the steps of: 

receiving a color designation for at least one position; and 
generating color at said position in accordance with said 
color designation. 

24. The computer readable medium as set forth in claim 
20, further comprising the steps of: 

receiving contone color for at least one position; and 
generating for said position contone color in said docu- 
ment image. 
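
By way of illustration only, the differential encoding and decoding of successive positions recited in the claims may be sketched as follows. The function names and the convention that the first position is encoded relative to the origin (0, 0) are assumptions made for this sketch and are not taken from the disclosure.

```python
from typing import List, Tuple

Position = Tuple[int, int]  # (x, y)


def delta_encode(positions: List[Position]) -> List[Position]:
    """Replace each position with its offset from the previous position."""
    encoded = []
    prev_x, prev_y = 0, 0  # assumed starting point for the first offset
    for x, y in positions:
        encoded.append((x - prev_x, y - prev_y))
        prev_x, prev_y = x, y
    return encoded


def delta_decode(deltas: List[Position]) -> List[Position]:
    """Accumulate offsets to recover absolute coordinates."""
    positions = []
    x, y = 0, 0
    for dx, dy in deltas:
        x += dx
        y += dy
        positions.append((x, y))
    return positions


# Usage: three shapes along one text line, then the start of the next line.
positions = [(100, 40), (112, 40), (125, 40), (100, 62)]
deltas = delta_encode(positions)
restored = delta_decode(deltas)
```

Because neighboring shapes on a text line tend to lie close together, the offsets are typically small numbers, which is what allows the position data to be stored in less space than absolute coordinates.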